Evaluation of window cohabitation of DNA sequencing errors and lowest PHRED quality values.
Authors
Abstract
When analyzing sequencing reads, it is important to distinguish between putatively correct and wrong bases. An open question is how well a PHRED quality value identifies miscalled bases, and whether there is a quality cutoff that captures most errors. Because a low quality value does not necessarily indicate a miscalled position, we decided to investigate whether window-based analyses of quality values might predict errors better. There are many reasons to look for a perfect window in DNA sequences, such as when using the SAGE technique, seeding BLAST searches, or clustering sequences. Thus, we set out to find a quality cutoff value that would distinguish non-perfect windows from perfect ones. We produced 846 reads of pUC18 and compared them with the published pUC consensus by local alignment. We then generated a database containing all mismatches, insertions and gaps in order to map the truly perfect windows. We investigated how well perfect windows can be predicted when all bases in the window show quality values over a given cutoff. We conclude that, in window-based applications, a PHRED quality value cutoff of 7 masks most of the errors without masking truly correct windows. We suggest that the putative wrong bases be indicated in lower case, increasing the information in sequence databases without increasing the size of the files.
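The window test and the lower-case masking described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size of 4, the example read, and the choice to treat a base as passing when its quality is greater than or equal to the cutoff are all assumptions made here for illustration.

```python
from typing import List, Sequence


def predicted_perfect_windows(quals: Sequence[int], size: int, cutoff: int) -> List[int]:
    """Return start indices of windows in which every base meets the cutoff.

    Assumption: a base passes when quality >= cutoff; the abstract does not
    state whether the comparison is strict.
    """
    return [
        start
        for start in range(len(quals) - size + 1)
        if all(q >= cutoff for q in quals[start:start + size])
    ]


def mask_low_quality(seq: str, quals: Sequence[int], cutoff: int) -> str:
    """Write bases below the cutoff in lower case and all others in upper case."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    return "".join(
        base.upper() if q >= cutoff else base.lower()
        for base, q in zip(seq, quals)
    )


# Hypothetical read with two low-quality positions (indices 2 and 7).
read = "ACGTACGTAC"
qualities = [30, 30, 5, 30, 30, 30, 30, 6, 30, 30]

windows = predicted_perfect_windows(qualities, size=4, cutoff=7)
masked = mask_low_quality(read, qualities, cutoff=7)
print(windows)  # [3]
print(masked)   # ACgTACGtAC
```

With a cutoff of 7, only the window starting at index 3 is predicted to be perfect, and the two low-quality bases are marked in lower case without changing the length of the sequence.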
Similar resources
Estimation of errors in "raw" DNA sequences: a validation study.
As DNA sequencing is performed more and more in a mass-production-like manner, efficient quality control measures become increasingly important for process control, but so also does the ability to compare different methods and projects. One of the fundamental quality measures in sequencing projects is the position-specific error probability at all bases in each individual sequence. Accurate pre...
Benchmark for evaluating the quality of DNA sequencing: proposal from an international external quality assessment scheme.
BACKGROUND: In the past 15 years, clinical laboratory science has been transformed by the use of technologies that cross the traditional boundaries between laboratory disciplines. However, during this period, issues of quality have not always been given adequate attention. The European Molecular Genetics Quality Network (EMQN) has developed a novel external quality assessment scheme for evaluati...
DNA Sequences Base Calling by Phred: Error Pattern Analysis
PHRED is the most frequently used base caller algorithm in genome projects. An interesting point on PHRED utilization is the fact that a low score on some base may not actually correspond to a miscalling on that base, but it may stand for a putative error on the region around this base. In order to evaluate the efficiency of PHRED on base calling and base quality assigning, we have sequenced pU...
Genomic DNA preparation enabling multiple replicate reads for accurate nanopore sequencing
Sequencing at single-nucleotide resolution using nanopore devices is performed with reported error rates of 10.5-20.7% (Ip et al., 2015). Since errors occur randomly during sequencing, repeating the sequencing procedure for the same DNA strands several times can generate sequencing results based on consensus derived from replicate readings, thus reducing overall error rates. The method presented i...
DNA sequencing reads and variants calling using mapping quality scores (Supplementary Text)
In this supplement text, a letter in uppercase indicates a random variable, whereas a letter in lowercase represents a constant, a known value or a function. Let $\Sigma = \{\text{A}, \text{C}, \text{G}, \text{T}\}$ be the alphabet of the four nucleotides. In sequencing, the true nucleotide is $B \in \Sigma$ and the one estimated by the base caller is $\hat{B}$. The base error $\epsilon_B$ is defined as $\epsilon_B = \Pr\{\hat{B} \neq B\}$ and the base quality $Q_B$ is $Q_B = -c \log \epsilon_B$ ...
Journal title:
- Genetics and molecular research : GMR
Volume 3, Issue 4
Pages -
Publication date 2004